Ambiguss, a game for building a Sense Annotated Corpus for French
Abstract
Evaluating a Word Sense Disambiguation (WSD) task is at least as difficult as developing the task itself. Manually constructing a corpus of ambiguous sentences is difficult and tedious, and the corpus is even harder to produce if each ambiguous word in each sentence must be associated with its correct meaning. First, in addition to finding or inventing the sentence, one needs a word sense lexicon from which to take the correct meaning of each ambiguous word, and such a resource may not be available for a given language. Second, referring to a given meaning can be tricky. One option is to attach a meaning number, but then the lexicon must be provided along with the corpus. Another is to represent the correct meaning of the ambiguous word by a gloss, i.e. a word or group of words that intuitively refers to that meaning. A classic example is bank > river and bank > money: the glosses river and money refer to two possible meanings of the word bank. Defining the glosses, i.e. choosing the word that best illustrates a meaning, is another pitfall: it seems that people do not always agree strongly on such a choice.
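The gloss notation described above can be sketched as a small data structure. This is only an illustration, not the representation used by Ambiguss: the class `GlossAnnotation`, the helper `agreement`, and the two example sentences are hypothetical, and the agreement measure (share of annotators choosing the most frequent gloss) is a deliberately simple stand-in for whatever measure one would actually use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossAnnotation:
    """One ambiguous word in a sentence, tagged with a gloss (hypothetical structure)."""
    sentence: str
    word: str
    gloss: str

    def label(self) -> str:
        # Render the annotation in the "word > gloss" notation of the abstract.
        return f"{self.word} > {self.gloss}"


def agreement(glosses: list[str]) -> float:
    """Share of annotators who chose the most frequent gloss (illustrative measure only)."""
    if not glosses:
        raise ValueError("need at least one gloss")
    counts = Counter(glosses)
    return counts.most_common(1)[0][1] / len(glosses)


# Invented example sentences for the two meanings of "bank".
corpus = [
    GlossAnnotation("She sat on the bank and watched the water.", "bank", "river"),
    GlossAnnotation("He deposited the cheque at the bank.", "bank", "money"),
]
labels = [a.label() for a in corpus]
```

Because each annotation carries its gloss as plain text, such a corpus can be read without shipping a sense lexicon alongside it, which is the advantage over meaning numbers noted above. The price is that different people may propose different glosses for the same meaning, which is what a measure like `agreement` would expose.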
Similar resources
Building an Annotated English-Vietnamese Parallel Corpus for Training Vietnamese-related NLPs
In NLP (Natural Language Processing) tasks, the greatest difficulty computers have to face is the built-in ambiguity of natural languages. Formerly, disambiguation relied on human-devised rules. Building such a complete rule set is a time-consuming and labor-intensive task, and it still doesn't cover all cases. Besides, when the scale of the system increases, it is very difficult t...
POS-Tagger for English-Vietnamese Bilingual Corpus
Corpus-based Natural Language Processing (NLP) tasks for such popular languages as English, French, etc. have been well studied with satisfactory achievements. In contrast, corpus-based NLP tasks for unpopular languages (e.g. Vietnamese) are at a deadlock due to absence of annotated training data for these languages. Furthermore, hand-annotation of even reasonably well-determined features such ...
Building Chinese Sense Annotated Corpus with the Help of Software Tools
This paper presents the procedure for building a Chinese sense annotated corpus. A set of software tools is designed to help human annotators speed up annotation and keep it consistent. The software tools include 1) a tagger for word segmentation and POS tagging, 2) an annotating interface responsible for the sense describing in the lexicon and sense annotating in the corpus, 3) ...
A new semantically annotated corpus with syntactic-semantic and cross-lingual senses
In this article, we describe a new sense-tagged corpus for Word Sense Disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in the English version of this instance in a parallel corpus, (2) an entry of the verb in a computational dictionary of French (the Lexicon-Gramm...
Integrating lexicographic examples in a lexical network (Intégration relationnelle des exemples lexicographiques dans un réseau lexical) [in French]
This paper presents a set of lexicographic examples that is being developed alongside the French Lexical Network. The possibility of using this set as an annotated corpus for research on automatic Word Sense Disambiguation is examined. Keywords: French Lexical Network (Réseau Lexical du Français), lexicographic examples, semantically annotated corpus.